Allow using mean read length in FASTQ-trained read simulation#4855

Open
faithokamoto wants to merge 2 commits into master from sim-avg-len


@faithokamoto
Contributor

Changelog Entry

To be copied to the draft changelog by merger:

  • Add vg sim --use-average-length option

Description

When vg sim -F is given a FASTQ file to match, it auto-detects a read length and then simulates reads of that length. The current auto-detection logic uses the mode, i.e. the most common read length. That works fine for short reads, where the most common length is probably the target, but for long reads, whose lengths vary widely, the mode can be unrepresentative of the file as a whole. This PR adds a --use-average-length/-L option, which tells the sampler to calculate the mean read length and use that instead.
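
To illustrate the difference between the two statistics, here is a minimal Python sketch, not the vg implementation. The function names, the toy FASTQ data, and the choice to round the mean to the nearest integer are all assumptions made for this example; how vg itself rounds is not stated here.

```python
from collections import Counter
from io import StringIO


def read_lengths(fastq_handle):
    """Collect the sequence length of every record in a 4-line-per-record FASTQ."""
    lengths = []
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            lengths.append(len(line.rstrip("\n")))
    return lengths


def mode_length(lengths):
    """Most common read length (ties go to the length seen first)."""
    return Counter(lengths).most_common(1)[0][0]


def mean_length(lengths):
    """Mean read length, rounded to the nearest integer (an assumption)."""
    return round(sum(lengths) / len(lengths))


def make_fastq(seq_lengths):
    """Build a toy FASTQ string with one record per requested length."""
    records = []
    for n, length in enumerate(seq_lengths):
        records.append(f"@read{n}\n{'A' * length}\n+\n{'I' * length}\n")
    return "".join(records)


# Toy long-read-like data: two reads happen to share a short length,
# while the rest are longer and all different.
toy_fastq = make_fastq([12, 12, 30, 40, 51])
lengths = read_lengths(StringIO(toy_fastq))

print("mode:", mode_length(lengths))  # 12
print("mean:", mean_length(lengths))  # 29
```

In this toy input the mode lands on a length that only two reads share, while the mean sits closer to the bulk of the data, which is the situation the new option is meant to handle.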

The ideal state would probably be simulating from a read length distribution. However, we're only set up to simulate a single read length at a time, and since I'm currently doing HiFi reads (not nanopore) this is good enough for now.
